Post-Editing Through Approximation and Global Correction

Authors

  • Kazem Taghva
  • Julie Borsack
  • Bryan Bullard
  • Allen Condit
Abstract

This paper describes a new automatic spelling correction program for OCR-generated errors. The method is based on three principles:

1. Approximate string matching between the misspellings and the terms occurring in the database, as opposed to the entire dictionary
2. Local information obtained from the individual documents
3. The use of a confusion matrix, which contains information specific to the nature of errors caused by the particular OCR device

The system was then used to process approximately 10,000 pages of OCR-generated documents. Among the misspellings discovered by this algorithm, about 87% were corrected.
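The first and third principles can be illustrated with a minimal sketch. This is not the authors' implementation: the confusion costs, the distance threshold, the frequency tie-break, and the names `CONFUSION_COST`, `weighted_distance`, and `correct` are all assumptions made for illustration, and the second principle (per-document local information) is left out. The idea shown is an edit distance in which substitutions that a particular OCR device is known to make are cheaper, with candidates drawn only from terms that occur in the collection.

```python
from collections import Counter

# Hypothetical confusion costs: (OCR output char, intended char) -> cost.
# A real confusion matrix would be estimated from the OCR device's errors.
CONFUSION_COST = {("1", "l"): 0.2, ("0", "o"): 0.2, ("c", "e"): 0.4}


def weighted_distance(observed, candidate, confusion=CONFUSION_COST):
    """Edit distance where known OCR confusions are cheap substitutions."""
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, a in enumerate(observed, start=1):
        curr = [float(i)]
        for j, b in enumerate(candidate, start=1):
            sub = 0.0 if a == b else confusion.get((a, b), 1.0)
            curr.append(min(prev[j] + 1.0,       # delete a from observed
                            curr[j - 1] + 1.0,   # insert b into observed
                            prev[j - 1] + sub))  # match or substitute
        prev = curr
    return prev[-1]


def correct(word, collection_counts, max_distance=1.0):
    """Return the closest collection term, or the word itself if none is close."""
    if word in collection_counts:
        return word
    best = None
    for term, freq in collection_counts.items():
        d = weighted_distance(word, term)
        if d <= max_distance:
            key = (d, -freq, term)  # prefer small distance, then frequent terms
            if best is None or key < best:
                best = key
    return best[2] if best else word


# Assumed toy collection vocabulary with term frequencies.
collection_counts = Counter({"nuclear": 12, "unclear": 3, "waste": 7})
corrected = [correct(w, collection_counts) for w in ["nuc1ear", "wastc", "xyzzy"]]
```

Restricting candidates to `collection_counts` rather than a full dictionary mirrors the first principle: a misspelling is matched against terms that actually occur in the database, which keeps the candidate set small and domain-specific.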

Similar resources

Qualitative Analysis of Post-Editing for High Quality Machine Translation

In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort based on Post-Editing Action (PEA) for which we provide human evaluation guidelines and propos...

Compressional Stability Behavior of Composite Plates with Multiple Through-the-Width Delaminations

In this paper, the compressive behavior of composite laminates with multiple through-the-width delaminations is investigated analytically. The analytical method is based on the CLPT theory and its formulation is developed on the basis of the Rayleigh-Ritz approximation technique to analyze the buckling and post-buckling behavior of the delaminated laminates. The method can handle both local buc...

Critique of Manuscript-Correction/ The Role of Editors in Presenting the Author: A review of Toghray Mashhadi's biography in his newly published Book of Essays, Fatima Mehri

Fatemeh Mehri, Associate Professor of Persian Language and Literature, Shahid Beheshti University. Abstract: Researchers in the field of editing and correcting manuscripts consider the writing of introductions as part of the correction process. T...

eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

Training models for the automatic correction of machine-translated text usually relies on data consisting of (source, MT, human postedit) triplets providing, for each source sentence, examples of translation errors with the corresponding corrections made by a human post-editor. Ideally, a large amount of data of this kind should allow the model to learn reliable correction patterns and effectiv...

Testing “Prompt”: The Development of a Rapid Post-Editing Service at CLS Corporate Language Services AG, Switzerland

CLS Corporate Language Services AG recently began offering the rapid post-editing of raw machine translation output to meet the rising demand for this service among clients. What is meant by rapid post-editing is the rough correction of machine translated texts with emphasis on speed and denotative accuracy. In the preliminary phase of the project, CLS conducted a test among four inhouse transl...

Journal:
  • IJPRAI

Volume 9

Publication date: 1995